title: Ecology of the Thymus Authors :Mary Browning, John Wares, Nancy Manley output: word_document ======================================================== ## Introduction
Within the field of organ biology, there is a lack of quantitative methods for analyzing the organization, as opposed to the composition, of organs, tissues, and cells. Without quantitative methods for assessing organization, a significant aspect of organ and tissue biology is inaccessible to analysis and contrast. Fortunately, a striking comparison can be drawn between the interactions that occur within organs and those that occur within ecosystems. For instance, cell types in an organ, like species in an ecosystem, are influenced by the availability of space and resources as well as by the presence of other species. The presence or absence of one type may alter the functional role and dominance of another, or the overall service the community provides to the larger system [e.g., @wooton05]. By comparing cell types within a biological organ to species in an ecosystem, quantitative methods used for ecological analyses can be directly applied. Through analysis of the exact spatial location and co-location of component cell types, in an approach that uses the analytical toolkit of community ecologists and biogeographers, we gain a better understanding of which interactions can be validated by independent approach and analysis, as well as which interactions or even cell types we are likely to be need more information about.
The co-distribution of constituent species in a community is often used to indicate interaction - whether competitive, trophic, or facilitative [@verberk11;@angel11]. By analyzing the exact spatial location and co-location of component cell types across the domain of a particular tissue or organ, we may gain a better understanding of which interactions - indicated by codistribution and sufficient density - can be validated by independent approach and analysis. To an extent, we recapitulate the spatial scales described above: we explore the differentiation of distinct habitats within the environment (biogeography: identifying regions of endemicity or dramatically shifted abundance), the colocalization of particular sub-groups of cells (community ecology), and we use the tenets of macroecology to the extent that we can generalize that cells need a particular density to be viable interactors, and to the extent that common cells have a larger overall distribution (this is not tautological, this follows from the first: you cannot be rare and widespread in a functional sense). Our overall goal is development of a cellular ecology, where we can understand interactions at a quantitative level that does not yet exist in developmental biology, particularly in this system. We seek to create a map of a healthy adult organ using cell location and abundance, and then to spatially characterize the organ by applying ecological theory directly to the produced map (something that has never been done before). The results from this analysis can be used to make comparisons with organs in states of disequilibrium, such as diseased or mutant organs, in order to see what organizational differences occur.
We performed this analysis on our model organ, the thymus. The thymus was chosen because it is an intriguing example of cellular level organization, with a strong connection between organization and function. The thymus consists of developing T cells supported by a complex cellular environment containing a variety of resident cell types, including thymic epithelial cells (TECs), dendritic cells, vasculature, and mesenchymal cells. These cell types comprise multiple microenvironments that direct and support thymocytes to develop from immature progenitors into mature T cells that are both self-tolerant and self-restricted. T cell development in the thymus requires interactions with the thymic microenvironments that provide signals for their survival, proliferation, and differentiation [1]. Despite their critical role in the generation of cellular immunity and the clinical importance of thymic regeneration, the composition and organization of thymic microenvironments and the mechanisms that promote their proper development and function are not fully understood, in part due to a lack of technical and theoretical approaches for quantifying tissue-level properties. Thus, the thymus has many characteristics that make it an excellent system for developing and testing quantitative modeling: a diverse cellular composition that can be identified with cell type-specific markers, regional organization that is required for maximal organ function, genetic models with diverse effects on organ composition and function, assays for experimentally inducing organ degeneration and regeneration, and high biomedical relevance. To date, there are no quantitative models of thymus organ structure and function and no established methods for generating one. We seek to address this by developing a quantitative theoretical model of the organization of cell types that can be used to better understand thymic function as well as evaluate diseased states. Our approach uses immunostaining on sagittal serial sections of a wildtype mouse thymus in order to identify distinct cellular subsets, followed by the use of novel computational approaches in order to quantitatively identify known and cryptic cellular spatial relationships. We then take this approach one step further by comparing a wildtype and a mutant (Aire knockout) mouse thymus in an effort to identify differences in organizational characteristics.
I suggest an explanatory box for readers so that quadrat, section, composite, etc. can all be explained clearly in one place
| First | row | 12.0 | Example of a row that spans multiple lines. |
| Second | row | 5.0 | Here’s another one. Note the blank line between rows. |
6b. bray curtis
6c. significant diffs between cell types?
also Andrew Sornborger’s approach for separating out the geometry/geography of clusters., and/or using connected component labeling to identify size, centroid, distance of centroids among clusters
jackknifing : removing one cell type at a time
CellProfiler outputs X&Y coordinates in an excel file (which also contains other unnecessary info). I delete the unnecessary info and add points ((0,0), (0,X), (X,0), (X,X)) to the X and Y columns in order to make a square so that the bin sizes in the matrix will be a consistent size. I save this file as a .txt. It is now ready to run through my script. The next step is to transform my data from individual .txt files into a community matrix that contains all the cell types. The community matrix is then used for Kmeans clustering.
Read in datafile generated from CellProfiler (1 cell type per file)
The next step is clustering. First, download and load these packages:
Next Step: Figuring out how clustering is done We can see that we’ve asked R to take cell count info from 11 cell types and make 4 clusters
It turns out that cluster 1 is just empty space - but not clusters 2-4 #need automatic way to identify empty outside of thymus
We can make a running sum of all the counts of all cell types by cluster
and normalize these so that each cluster type has the same total amount of cells (in other words, we’re getting the proportion of cell types in each cluster)
We can look at the difference between pairs of clusters to see which cell types are most different between clusters
The way to read the last plots is for something like “2-3”, positive bars mean that cell type is more common in cluster 2 (than 3); negative bars mean the opposite. The higher the absolute number on the y-axis, the greater the difference is in that cell count between the two clusters. I think (as John suggested), that we can move ahead and be more formal with this analysis (e.g. Bray Curtis dissimilarity), but this is a useful first step.
In R, the analogy to re-oarganizing data by either pivot tables, or manual calculation and pasting into a new sheet, is to manipulate one data frame to another. The ‘dplyr’ package or the ‘aggregate’ function can both help you here. For most standard ecology metrics there are usually one or more packages that make our life easier. Here the ‘vegan’ package is useful. In this example, we work out Bray-Curtis for all clusters (pairwise comparison)
df<-NULL df2<-NULL #these nulls are here to reset the dataframes from the rbind that is below, only important if iteratign across commands outside of KNITR
library(vegan) # you may need to install this if you don't have it
library(dplyr) # ditto (used for chaining and manipulation)
# http://blog.rstudio.org/2014/01/17/introducing-dplyr/
#df<-spe[which(spe.kmeans$cluster==1),] %>% colSums() %>% t() # example of chaining.
#reads as: take only cluster 1 data from 'spe', then take column sums (i.e. total cell type count)
#then transpose (swap rows/cols needed for bray curtis calc)
# each of these
df<-spe[which(spe.kmeans$cluster==1),] %>% colSums() %>% t()
for (j in 2:4){
tmp<-spe[which(spe.kmeans$cluster==j),] %>% colSums() %>% t()
df<-rbind(df,tmp)
}
df1.1<-spe1.1[which(spe.kmeans1.1$cluster==1),] %>% colSums() %>% t()
for (j in 2:4){
tmp1.1<-spe1.1[which(spe.kmeans1.1$cluster==j),] %>% colSums() %>% t()
df1.1<-rbind(df1.1,tmp1.1)
}
df1.2<-spe1.2[which(spe.kmeans1.2$cluster==1),] %>% colSums() %>% t()
for (j in 2:4){
tmp1.2<-spe1.2[which(spe.kmeans1.2$cluster==j),] %>% colSums() %>% t()
df1.2<-rbind(df1.2,tmp1.2)
}
df1.2<-spe1.2[which(spe.kmeans1.2$cluster==1),] %>% colSums() %>% t()
for (j in 2:4){
tmp1.2<-spe1.2[which(spe.kmeans1.2$cluster==j),] %>% colSums() %>% t()
df1.2<-rbind(df1.2,tmp1.2)
}
df2<-spe2[which(spe.kmeans2$cluster==1),] %>% colSums() %>% t()
for (j in 2:4){
tmp2<-spe2[which(spe.kmeans2$cluster==j),] %>% colSums() %>% t()
df2<-rbind(df2,tmp2)
}
df2.1<-spe2.1[which(spe.kmeans2.1$cluster==1),] %>% colSums() %>% t()
for (j in 2:4){
tmp2.1<-spe2.1[which(spe.kmeans2.1$cluster==j),] %>% colSums() %>% t()
df2.1<-rbind(df2.1,tmp2.1)
}
df2.2<-spe2.2[which(spe.kmeans2.2$cluster==1),] %>% colSums() %>% t()
for (j in 2:4){
tmp2.2<-spe2.2[which(spe.kmeans2.2$cluster==j),] %>% colSums() %>% t()
df2.2<-rbind(df2.2,tmp2.2)
}
df2.2<-spe2.2[which(spe.kmeans2.2$cluster==1),] %>% colSums() %>% t()
for (j in 2:4){
tmp2.2<-spe2.2[which(spe.kmeans2.2$cluster==j),] %>% colSums() %>% t()
df2.2<-rbind(df2.2,tmp2.2)
}
rownames(df)<-c("cluster1","cluster2","cluster3","cluster4")
df<-apply(df,2,as.integer) %>% as.data.frame()
df1.1<-apply(df1.1,2,as.integer) %>% as.data.frame()
df1.2<-apply(df1.2,2,as.integer) %>% as.data.frame()
df1.2<-apply(df1.2,2,as.integer) %>% as.data.frame()
df2<-apply(df2,2,as.integer) %>% as.data.frame()
df2.1<-apply(df2.1,2,as.integer) %>% as.data.frame()
df2.2<-apply(df2.2,2,as.integer) %>% as.data.frame()
df2.2<-apply(df2.2,2,as.integer) %>% as.data.frame()
#need same number of columns to bind them.
colnames(df2)<-colnames(df)
colnames(df1.1)=colnames(df2.1)
colnames(df1.2)<-colnames(df2.1)
colnames(df1.2)<-colnames(df2.1)
colnames(df2.1)<-colnames(df2.1)
colnames(df2.2)<-colnames(df2.1)
colnames(df2.2)<-colnames(df2.1)
# this is it column names were wrong
dfTOTAL<-rbind(df1.2,df2.2, df1.2, df1.2, df1.1, df1.1)
BrayCurtis<-vegdist(dfTOTAL,method="bray")
print(BrayCurtis)
1 2 3 4 5 6
2 0.38557395
3 0.51648705 0.47329744
4 0.88428144 0.80884080 0.88156887
5 0.77661590 0.65654765 0.76160080 0.36509240
6 0.10660718 0.46978867 0.50634645 0.88065923 0.75988731
7 0.52519610 0.52207402 0.30578512 0.93199912 0.85929563 0.57772343
8 0.29611554 0.39078297 0.37104872 0.88453208 0.76719428 0.35775784
9 0.00000000 0.38557395 0.51648705 0.88428144 0.77661590 0.10660718
10 0.38557395 0.00000000 0.47329744 0.80884080 0.65654765 0.46978867
11 0.51648705 0.47329744 0.00000000 0.88156887 0.76160080 0.50634645
12 0.88428144 0.80884080 0.88156887 0.00000000 0.36509240 0.88065923
13 0.00000000 0.38557395 0.51648705 0.88428144 0.77661590 0.10660718
14 0.38557395 0.00000000 0.47329744 0.80884080 0.65654765 0.46978867
15 0.51648705 0.47329744 0.00000000 0.88156887 0.76160080 0.50634645
16 0.88428144 0.80884080 0.88156887 0.00000000 0.36509240 0.88065923
17 0.90474029 0.84151561 0.90248248 0.15579323 0.45057822 0.90172508
18 0.24152736 0.24056192 0.39823050 0.84662698 0.69696417 0.31037532
19 0.56612100 0.47159700 0.11670743 0.86178535 0.72470910 0.56315836
20 0.08681984 0.37267991 0.51548573 0.86991459 0.77774298 0.15627012
21 0.90474029 0.84151561 0.90248248 0.15579323 0.45057822 0.90172508
22 0.24152736 0.24056192 0.39823050 0.84662698 0.69696417 0.31037532
23 0.56612100 0.47159700 0.11670743 0.86178535 0.72470910 0.56315836
24 0.08681984 0.37267991 0.51548573 0.86991459 0.77774298 0.15627012
7 8 9 10 11 12
2
3
4
5
6
7
8 0.45040199
9 0.52519610 0.29611554
10 0.52207402 0.39078297 0.38557395
11 0.30578512 0.37104872 0.51648705 0.47329744
12 0.93199912 0.88453208 0.88428144 0.80884080 0.88156887
13 0.52519610 0.29611554 0.00000000 0.38557395 0.51648705 0.88428144
14 0.52207402 0.39078297 0.38557395 0.00000000 0.47329744 0.80884080
15 0.30578512 0.37104872 0.51648705 0.47329744 0.00000000 0.88156887
16 0.93199912 0.88453208 0.88428144 0.80884080 0.88156887 0.00000000
17 0.94427108 0.90494885 0.90474029 0.84151561 0.90248248 0.15579323
18 0.45847325 0.24184646 0.24152736 0.24056192 0.39823050 0.84662698
19 0.35674682 0.40443364 0.56612100 0.47159700 0.11670743 0.86178535
20 0.54619257 0.35168054 0.08681984 0.37267991 0.51548573 0.86991459
21 0.94427108 0.90494885 0.90474029 0.84151561 0.90248248 0.15579323
22 0.45847325 0.24184646 0.24152736 0.24056192 0.39823050 0.84662698
23 0.35674682 0.40443364 0.56612100 0.47159700 0.11670743 0.86178535
24 0.54619257 0.35168054 0.08681984 0.37267991 0.51548573 0.86991459
13 14 15 16 17 18
2
3
4
5
6
7
8
9
10
11
12
13
14 0.38557395
15 0.51648705 0.47329744
16 0.88428144 0.80884080 0.88156887
17 0.90474029 0.84151561 0.90248248 0.15579323
18 0.24152736 0.24056192 0.39823050 0.84662698 0.87329543
19 0.56612100 0.47159700 0.11670743 0.86178535 0.88598080 0.44050306
20 0.08681984 0.37267991 0.51548573 0.86991459 0.89276893 0.19695374
21 0.90474029 0.84151561 0.90248248 0.15579323 0.00000000 0.87329543
22 0.24152736 0.24056192 0.39823050 0.84662698 0.87329543 0.00000000
23 0.56612100 0.47159700 0.11670743 0.86178535 0.88598080 0.44050306
24 0.08681984 0.37267991 0.51548573 0.86991459 0.89276893 0.19695374
19 20 21 22 23
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20 0.56439324
21 0.88598080 0.89276893
22 0.44050306 0.19695374 0.87329543
23 0.00000000 0.56439324 0.88598080 0.44050306
24 0.56439324 0.00000000 0.89276893 0.19695374 0.56439324
hc<-hclust(BrayCurtis)
plot(hc,labels=dfTOTAL$rownames)
To an extent, as K goes up it may be that spatial weighting is important, but we choose to avoid the problems that weighting brings up; it may also be possible that some of the areas defined as K approaches 10+ are legitimate microenvironments and this will be addressed in a later paper.